This is the second workshop designed for Cancer MSc students at the UCL Cancer Institute to help them gain some confidence in using the R (statistical) programming language in their MSc projects. I would appreciate it if you could fill in this pre-course survey (once again) so that I know what you expect from today's workshop.
After failing my practical driving test twice, I took some time off from driving lessons. The reason was partly financial. It was also the beginning of the short winter days, and I was a bit worried about taking the test on those cold days when the roads in Edinburgh became tricky to drive on. The next summer, when I tried to contact my driving instructor again, I was surprised to learn that he had changed his profession. My wife still secretly blames me, as it was not the first time one of my driving instructors had stopped training people or changed career (though my first instructor took a break due to some family responsibilities).
So, I went to a third driving instructor, Bill. He was in his late 70s, I guess, and initially I had a hard time understanding what he was saying. You may think that is not ideal at all for a driving lesson, but interestingly enough, it worked out in the end and I got my driving licence this time. Bill used to be an engineer, and after his retirement he started a second career as a driving instructor. On the first day, Bill told me to forget everything I had learned about driving so far. I was indeed a bit shocked by what felt like a condescending approach, but once he started the lesson, it felt like he was teaching me the grammar of driving: how to control the clutch, how to read the mind of the driver of an oncoming vehicle, etc.
By now, you may have started to wonder what Bill has to do with you or
this workshop. Bear with me. Each time I think about two R packages,
namely dplyr and ggplot2, they remind me of Bill. In this workshop on
exploratory data analysis using R, we will learn the grammar of data
manipulation and the grammar of graphics to draw fancy plots. What you
learned in the previous workshop was not even the tip of the iceberg of
plotting with R. The functions we used there were a bit rigid and gave
you less control over your plots. Here, with the ggplot2 package, we
will shape our plots as we wish (at least to a greater degree). We will
draw layer upon layer to incorporate many aspects of the data in a
single plot. For data handling, we will use the dplyr package, adding
layers of functions as we progress to build our data structure for
downstream analysis. As a whole, we will try to tell a story with our
plots during the workshop.
This is the second and last workshop in this series for this year. In the first workshop, I introduced very basic R functions for data handling and generating basic plots. Some of you wondered about the usefulness of R for your biological research in the coming months. That might be due to my choice of very simple datasets that come with base R. I was consciously avoiding more complex real-life datasets (especially those related to molecular biology or omics) so that you (at least ~70% of you) would not be startled by them while encountering R syntax for the first time.
In the first half of today's workshop, we will learn a more efficient
way of handling/manipulating data using an R package called
dplyr and generate plots using another package called
ggplot2. However, we will still be using built-in data from
base R. Don't be disheartened; we will soon shift our attention
to real-life/clinical data. We will be using a few datasets that were
part of a study called METABRIC (Molecular Taxonomy of Breast Cancer
International Consortium). These datasets characterise the genomic
mutations (SNVs and CNAs) and gene expression profiles
of over 2000 primary breast tumours. In addition, detailed
clinical information for this study can be found alongside the
experimental data on cBioPortal,
which we will integrate with the experimental data. You can follow the little
download sign on that page or you can click here
to download the dataset. Save the brca_metabric.tar.gz file
somewhere on your computer and decompress it. We will import some of
the files from there.
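If you would rather stay inside R, base R's untar() can also decompress the archive. The sketch below wraps it in a small helper. The function name extract_archive() and the output folder name are my own choices for illustration and are not part of the workshop material.

```r
# Hypothetical helper: extract a .tar.gz archive and list what came out
extract_archive <- function(archive, exdir) {
  if (!file.exists(archive)) {
    stop("Archive not found: ", archive)
  }
  # untar() detects gzip compression itself and creates exdir if needed
  untar(archive, exdir = exdir)
  list.files(exdir, recursive = TRUE)
}

# Usage, assuming the archive is saved in your current working directory:
# files <- extract_archive("brca_metabric.tar.gz", exdir = "brca_metabric")
```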
In this workshop, we are not planning to do any major data analysis. Instead, we will stick to the realm of (to use the fancy name) exploratory data analysis (EDA): formatting data and drawing some informative plots. We will learn a few important functions (or verbs) for data manipulation. We will find out which mutation type was the most prominent. We will also generate a word cloud from the most frequently affected genes in the patient cohort.
We will look at the expression of the GATA3 transcription factor
in PAM50-classified samples and in samples with different
ER status. We will also look at the age distribution of the
patients for some selected mutated genes. Lastly, we will explore the
co-occurrence of mutations among some cancer-related genes in
the METABRIC cohort.
dplyr
Trust me, data manipulation is the part of my research where I spend a significant portion of my time. Real-life data are not polished or nicely annotated. Moreover, when you want to integrate data from different sources, the "fun" begins (I am making air quotes, of course)! You also need to format the output of one process to make it fit for the next one. So, there is no escape from formatting/manipulating data in real life.
Here, we will be using the dplyr package, which is one of
the most powerful and popular packages in R. The d
here stands for data, and plyr is meant to evoke the tool
pliers. Therefore, the name dplyr refers to a tool for
manipulating data (frames). dplyr provides a
grammar of data manipulation: the functions it provides
act as the verbs in your code, and they are very
efficient at solving the most common data manipulation problems. It is
arguably sometimes more efficient than the equivalent base R
operations.
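To give a flavour of the "verbs" idea, here is the same subset of iris written in base R and with two dplyr verbs, filter() and select(), both of which are covered later in this workshop. This sketch only illustrates readability; it is not a performance comparison.

```r
library(dplyr)

# Base R: index rows with a logical vector and columns with a character vector
base_way <- iris[iris$Species == "setosa", c("Sepal.Length", "Sepal.Width")]

# dplyr: the same subset, read as "filter the rows, then select the columns"
dplyr_way <- select(filter(iris, Species == "setosa"), Sepal.Length, Sepal.Width)

# Both give the same 50 x 2 table (attributes such as row names are ignored)
all.equal(base_way, dplyr_way, check.attributes = FALSE)
```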
There are two main ways to install the dplyr package in
R. You can install the tidyverse package, and
dplyr, being part of it, will automatically be installed
in your R environment.
install.packages("tidyverse")
Or, you can install just the dplyr package with -
install.packages("dplyr")
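If you share scripts with others, a common pattern is to install a package only when it is missing, so that re-running the script does not reinstall it every time. A minimal sketch using base R's requireNamespace():

```r
# Install dplyr only if it cannot already be loaded
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Report which version is installed
packageVersion("dplyr")
```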
However, if you want to install the development version, which I would not recommend at this stage, you can run the code below -
# install devtools first if it is not available yet
if (!requireNamespace("devtools", quietly = TRUE)) {
install.packages("devtools")
}
# dplyr's development repository now lives in the tidyverse organisation on GitHub
devtools::install_github("tidyverse/dplyr")
And, now load it …
library(dplyr)
It would be a crime not to introduce the pipe operator
%>% to you before starting with the dplyr
verbs. If you are familiar with the pipe operator | in bash
scripting, that's it. I have no better way to describe it to you. But
if you are not, then here is the thing for you -
The pipe operator %>% connects two operations on the
same data (be it a vector or a data frame). It passes the output of
the operation on its left-hand side as the first argument to the
operation on its right-hand side. If you want a formal definition:
x %>% f(y) is converted into f(x, y) by
the pipe operator.
Let's look at an example. Say we have a vector x that
holds the values from 1 to 100, and we want to calculate the mean
of x and round it to an integer. In base R, we write -
x <- 1:100
round(mean(x))
## [1] 50
On the other hand, using the pipe operator, we can start with
x, then calculate the mean and, at the end,
round it to an integer, like -
x <- 1:100
x %>% mean %>% round
## [1] 50
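The rule that x %>% f(y) becomes f(x, y) also covers functions with extra arguments: the piped value fills the first argument and whatever is inside the parentheses follows it. The %>% operator (which dplyr re-exports from the magrittr package) also lets you place the piped value elsewhere with a dot, `.`; I show that here only as an extra, not something we rely on later.

```r
library(dplyr)  # re-exports %>% from the magrittr package

y <- c(3.14159, 2.71828)

# Extra arguments follow the piped value: this is round(y, 2)
piped  <- y %>% round(2)
nested <- round(y, 2)
identical(piped, nested)   # TRUE

# The dot puts the piped value in a different position: this is seq(1, 5)
5 %>% seq(1, .)            # 1 2 3 4 5
```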
It reads from left to right, the same way we think through and build our data analysis
pipeline. Since version 4.1.0, R itself also provides a native pipe operator,
|>, which works with dplyr verbs as well, but I will stick to
%>% in this workshop.
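For comparison, here is the same example written with R's native pipe |> (R 4.1.0 or later). One difference worth knowing: with |> the right-hand side must be a function call with parentheses, so you write mean() rather than mean.

```r
library(dplyr)

x <- 1:100

# magrittr pipe: bare function names are allowed
a <- x %>% mean %>% round

# native pipe (R >= 4.1.0): parentheses are required
b <- x |> mean() |> round()

identical(a, b)   # TRUE; both are 50
```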
There are many verbs in the dplyr package. Here,
I will discuss a few (but very important) ones that you will need
to resolve most of the data manipulation challenges in your day-to-day
work.
select() picks variables based on their names or types.
For example -
# using specific variable names -
iris %>%
select(Sepal.Length, Sepal.Width)
| Sepal.Length | Sepal.Width |
|---|---|
| 5.1 | 3.5 |
| 4.9 | 3.0 |
| 4.7 | 3.2 |
| 4.6 | 3.1 |
| 5.0 | 3.6 |
| 5.4 | 3.9 |
| ... | ... |
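select() also accepts a range of neighbouring columns written with `:`, and a minus sign to drop columns instead of keeping them. A short sketch on the same iris data:

```r
library(dplyr)

# Keep every column from Sepal.Length through Petal.Length
range_cols <- iris %>% select(Sepal.Length:Petal.Length)
names(range_cols)   # "Sepal.Length" "Sepal.Width" "Petal.Length"

# Drop Species and keep the rest
no_species <- iris %>% select(-Species)
names(no_species)   # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```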
# using type -
iris %>%
select(where(is.numeric))
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width |
|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 |
| 4.9 | 3.0 | 1.4 | 0.2 |
| 4.7 | 3.2 | 1.3 | 0.2 |
| 4.6 | 3.1 | 1.5 | 0.2 |
| 5.0 | 3.6 | 1.4 | 0.2 |
| 5.4 | 3.9 | 1.7 | 0.4 |
| ... | ... | ... | ... |
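where() accepts any function that returns a single TRUE or FALSE per column, including a formula-style lambda. As an illustration (the threshold of 3 is arbitrary), this keeps only the numeric columns whose mean is above 3:

```r
library(dplyr)

# .x stands for each column in turn; && needs single TRUE/FALSE values,
# and it skips mean() for non-numeric columns such as Species
big_means <- iris %>%
  select(where(~ is.numeric(.x) && mean(.x) > 3))

names(big_means)   # "Sepal.Length" "Sepal.Width" "Petal.Length"
```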
The verb select() comes with some selection
helpers -
If you want to select all the variables, you can use
everything() -
iris %>%
select(everything())
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| ... | ... | ... | ... | ... |
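everything() becomes more useful when combined with named columns. select() returns columns in the order you list them and does not repeat a column it has already picked, so this moves a column to the front without dropping any others:

```r
library(dplyr)

# Species first, then all remaining columns in their original order
reordered <- iris %>% select(Species, everything())
names(reordered)
# "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
```

Since version 1.0.0, dplyr also has a dedicated verb for this job, relocate().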
You can choose the last column using last_col(), or
only the grouping columns using group_cols() (you will
understand this better when I discuss the group_by() verb
later).
# select the last column
iris %>%
select(last_col())
| Species |
|---|
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| setosa |
| ... |
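last_col() also takes an offset that counts back from the end. This helps when you know a column's position from the right but not its name:

```r
library(dplyr)

# offset = 1 picks the second-to-last column
second_last <- iris %>% select(last_col(offset = 1)) %>% names()
second_last   # "Petal.Width"

# A range from the fourth-to-last column up to the last one
tail_four <- iris %>% select(last_col(offset = 3):last_col()) %>% names()
tail_four     # "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
```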
# select the grouped column(s)
iris %>%
group_by(Sepal.Length, Sepal.Width) %>%
select(group_cols())
| Sepal.Length | Sepal.Width |
|---|---|
| 5.1 | 3.5 |
| 4.9 | 3.0 |
| 4.7 | 3.2 |
| 4.6 | 3.1 |
| 5.0 | 3.6 |
| 5.4 | 3.9 |
| ... | ... |
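A related behaviour is worth knowing before we meet group_by() properly: select() on a grouped data frame always keeps the grouping column(s), even if you do not ask for them, and dplyr prints a message saying it has added them. Here is a sketch grouping by Species, which is the more typical choice for iris:

```r
library(dplyr)

grouped <- iris %>% group_by(Species)

# Only Sepal.Length is requested, but Species is kept too (with a message)
kept <- grouped %>% select(Sepal.Length)
names(kept)   # contains both "Species" and "Sepal.Length"

# group_cols() on its own returns just the grouping column
grouped %>% select(group_cols()) %>% names()   # "Species"
```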
If there’s a common prefix or suffix to some column names, you
can utilise that by using selection helpers starts_with()
or ends_with(), respectively -
# starts_with()
iris %>%
select(starts_with("Sepal"))
| Sepal.Length | Sepal.Width |
|---|---|
| 5.1 | 3.5 |
| 4.9 | 3.0 |
| 4.7 | 3.2 |
| 4.6 | 3.1 |
| 5.0 | 3.6 |
| 5.4 | 3.9 |
| ... | ... |
# ends_with()
iris %>%
select(ends_with("Length"))
| Sepal.Length | Petal.Length |
|---|---|
| 5.1 | 1.4 |
| 4.9 | 1.4 |
| 4.7 | 1.3 |
| 4.6 | 1.5 |
| 5.0 | 1.4 |
| 5.4 | 1.7 |
| ... | ... |
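Two details of these helpers are easy to miss. Matching ignores case by default, because the ignore.case argument is TRUE. You can also pass several prefixes or suffixes at once as a character vector.

```r
library(dplyr)

# Lower-case "sepal" still matches "Sepal.*" because ignore.case = TRUE
lower <- iris %>% select(starts_with("sepal")) %>% names()
lower    # "Sepal.Length" "Sepal.Width"

# A case-sensitive match: now nothing is selected
strict <- iris %>% select(starts_with("sepal", ignore.case = FALSE)) %>% names()
strict   # character(0)

# Several suffixes at once
several <- iris %>% select(ends_with(c("Width", "Species"))) %>% names()
several  # Sepal.Width, Petal.Width and Species
```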
You can even select columns by a pattern that appears anywhere in their names by
using contains() -
iris %>%
select(contains("dth"))
| Sepal.Width | Petal.Width |
|---|---|
| 3.5 | 0.2 |
| 3.0 | 0.2 |
| 3.2 | 0.2 |
| 3.1 | 0.2 |
| 3.6 | 0.2 |
| 3.9 | 0.4 |
| ... | ... |
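In recent versions of dplyr, you can also combine selection helpers with the operators & (and), | (or) and ! (not). This lets you express conditions such as "Petal columns that are not widths":

```r
library(dplyr)

# Petal columns that are not widths
petal_len <- iris %>% select(starts_with("Petal") & !ends_with("Width"))
names(petal_len)   # "Petal.Length"

# Everything that is neither a Sepal column nor Species
petal_only <- iris %>% select(!(starts_with("Sepal") | Species))
names(petal_only)  # "Petal.Length" "Petal.Width"
```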
You can even use a regular expression to select columns by using
matches() -
# column names containing W or d (matching ignores case by default)
iris %>%
select(matches("[Wd]"))
| Sepal.Width | Petal.Width |
|---|---|
| 3.5 | 0.2 |
| 3.0 | 0.2 |
| 3.2 | 0.2 |
| 3.1 | 0.2 |
| 3.6 | 0.2 |
| 3.9 | 0.4 |
| 3.4 | 0.3 |
| 3.4 | 0.2 |
| 2.9 | 0.2 |
| 3.1 | 0.1 |
| 3.7 | 0.2 |
| 3.4 | 0.2 |
| 3.0 | 0.1 |
| 3.0 | 0.1 |
| 4.0 | 0.2 |
| 4.4 | 0.4 |
| 3.9 | 0.4 |
| 3.5 | 0.3 |
| 3.8 | 0.3 |
| 3.8 | 0.3 |
| 3.4 | 0.2 |
| 3.7 | 0.4 |
| 3.6 | 0.2 |
| 3.3 | 0.5 |
| 3.4 | 0.2 |
| 3.0 | 0.2 |
| 3.4 | 0.4 |
| 3.5 | 0.2 |
| 3.4 | 0.2 |
| 3.2 | 0.2 |
| 3.1 | 0.2 |
| 3.4 | 0.4 |
| 4.1 | 0.1 |
| 4.2 | 0.2 |
| 3.1 | 0.2 |
| 3.2 | 0.2 |
| 3.5 | 0.2 |
| 3.6 | 0.1 |
| 3.0 | 0.2 |
| 3.4 | 0.2 |
| 3.5 | 0.3 |
| 2.3 | 0.3 |
| 3.2 | 0.2 |
| 3.5 | 0.6 |
| 3.8 | 0.4 |
| 3.0 | 0.3 |
| 3.8 | 0.2 |
| 3.2 | 0.2 |
| 3.7 | 0.2 |
| 3.3 | 0.2 |
| 3.2 | 1.4 |
| 3.2 | 1.5 |
| 3.1 | 1.5 |
| 2.3 | 1.3 |
| 2.8 | 1.5 |
| 2.8 | 1.3 |
| 3.3 | 1.6 |
| 2.4 | 1.0 |
| 2.9 | 1.3 |
| 2.7 | 1.4 |
| 2.0 | 1.0 |
| 3.0 | 1.5 |
| 2.2 | 1.0 |
| 2.9 | 1.4 |
| 2.9 | 1.3 |
| 3.1 | 1.4 |
| 3.0 | 1.5 |
| 2.7 | 1.0 |
| 2.2 | 1.5 |
| 2.5 | 1.1 |
| 3.2 | 1.8 |
| 2.8 | 1.3 |
| 2.5 | 1.5 |
| 2.8 | 1.2 |
| 2.9 | 1.3 |
| 3.0 | 1.4 |
| 2.8 | 1.4 |
| 3.0 | 1.7 |
| 2.9 | 1.5 |
| 2.6 | 1.0 |
| 2.4 | 1.1 |
| 2.4 | 1.0 |
| 2.7 | 1.2 |
| 2.7 | 1.6 |
| 3.0 | 1.5 |
| 3.4 | 1.6 |
| 3.1 | 1.5 |
| 2.3 | 1.3 |
| 3.0 | 1.3 |
| 2.5 | 1.3 |
| 2.6 | 1.2 |
| 3.0 | 1.4 |
| 2.6 | 1.2 |
| 2.3 | 1.0 |
| 2.7 | 1.3 |
| 3.0 | 1.2 |
| 2.9 | 1.3 |
| 2.9 | 1.3 |
| 2.5 | 1.1 |
| 2.8 | 1.3 |
| 3.3 | 2.5 |
| 2.7 | 1.9 |
| 3.0 | 2.1 |
| 2.9 | 1.8 |
| 3.0 | 2.2 |
| 3.0 | 2.1 |
| 2.5 | 1.7 |
| 2.9 | 1.8 |
| 2.5 | 1.8 |
| 3.6 | 2.5 |
| 3.2 | 2.0 |
| 2.7 | 1.9 |
| 3.0 | 2.1 |
| 2.5 | 2.0 |
| 2.8 | 2.4 |
| 3.2 | 2.3 |
| 3.0 | 1.8 |
| 3.8 | 2.2 |
| 2.6 | 2.3 |
| 2.2 | 1.5 |
| 3.2 | 2.3 |
| 2.8 | 2.0 |
| 2.8 | 2.0 |
| 2.7 | 1.8 |
| 3.3 | 2.1 |
| 3.2 | 1.8 |
| 2.8 | 1.8 |
| 3.0 | 1.8 |
| 2.8 | 2.1 |
| 3.0 | 1.6 |
| 2.8 | 1.9 |
| 3.8 | 2.0 |
| 2.8 | 2.2 |
| 2.8 | 1.5 |
| 2.6 | 1.4 |
| 3.0 | 2.3 |
| 3.4 | 2.4 |
| 3.1 | 1.8 |
| 3.0 | 1.8 |
| 3.1 | 2.1 |
| 3.1 | 2.4 |
| 3.1 | 2.3 |
| 2.7 | 1.9 |
| 3.2 | 2.3 |
| 3.3 | 2.5 |
| 3.0 | 2.3 |
| 2.5 | 1.9 |
| 3.0 | 2.0 |
| 3.4 | 2.3 |
| 3.0 | 1.8 |
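matches() is one of several tidyselect helpers that can be used inside select(). The sketch below, assuming dplyr is installed, shows starts_with() and ends_with(), and how the default case-insensitivity of matches() can be switched off with ignore.case = FALSE.

```r
library(dplyr)

# columns whose names end with "Width"
widths <- iris %>% select(ends_with("Width"))
# columns whose names start with "Sepal"
sepals <- iris %>% select(starts_with("Sepal"))

# matches() ignores case by default, so "length" finds "Length"
lengths_any_case <- iris %>% select(matches("length"))
# with ignore.case = FALSE nothing matches, leaving a data frame with no columns
lengths_exact_case <- iris %>% select(matches("length", ignore.case = FALSE))

names(widths)
names(sepals)
names(lengths_any_case)
ncol(lengths_exact_case)
```

A helper that matches nothing simply selects no columns rather than raising an error, which is worth keeping in mind when a regular expression is more restrictive than intended.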
The filter() verb subsets a data frame by rows: it keeps only the
rows that satisfy one or more conditions and drops the rest
(including rows where a condition evaluates to NA). Functions and
operators that are commonly used with the filter() verb include -
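==, >, <, >=, <= (comparison operators)
&, |, ! (logical AND, OR and NOT)
is.na() (tests for missing values)
%in% (tests membership in a set of values)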
Let’s see some examples -
# choose the rows whose Petal.Width is greater than 2
iris %>%
filter(Petal.Width > 2)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
# choose the rows for the setosa species
iris %>%
filter(Species == "setosa")
# equivalently: filter(Species %in% "setosa")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
# or the opposite: rows that are not setosa
iris %>% filter(Species != "setosa")
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
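The examples above use only ==, > and !=. The following sketch, assuming dplyr is installed, exercises the remaining operators from the list: & and | to combine conditions, %in% for membership, and is.na() for missing values. The small data frame df is made up purely for illustration.

```r
library(dplyr)

# & : both conditions must hold
wide_virginica <- iris %>% filter(Species == "virginica" & Petal.Width > 2)
# separating conditions with commas also combines them with &
wide_virginica_2 <- iris %>% filter(Species == "virginica", Petal.Width > 2)
# | : either condition may hold
short_or_wide <- iris %>% filter(Sepal.Length < 4.5 | Petal.Width > 2.4)
# %in% : the value must be one of several
not_setosa <- iris %>% filter(Species %in% c("versicolor", "virginica"))

# a small made-up data frame with a missing value
df <- data.frame(x = c(1, NA, 3))
above_one   <- df %>% filter(x > 1)      # the NA row is dropped, keeping only x = 3
non_missing <- df %>% filter(!is.na(x))  # keeps x = 1 and x = 3
```

Note that filter(x > 1) silently drops the NA row; if missing values should be kept, the condition has to say so explicitly, e.g. x > 1 | is.na(x).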
The verb mutate() creates new columns, whose values are often
computed from existing variables (i.e. columns).
iris %>%
mutate(Length_difference = Sepal.Length - Petal.Length) # not that this new column makes much sense
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | Length_difference |
|---|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa | 3.7 |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa | 3.5 |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa | 3.4 |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa | 3.1 |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa | 3.6 |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa | 3.7 |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa | 3.2 |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa | 3.5 |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa | 3.0 |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa | 3.4 |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa | 3.9 |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa | 3.2 |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa | 3.4 |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa | 3.2 |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa | 4.6 |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa | 4.2 |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa | 4.1 |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa | 3.7 |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa | 4.0 |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa | 3.6 |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa | 3.7 |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa | 3.6 |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa | 3.6 |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa | 3.4 |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa | 2.9 |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa | 3.4 |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa | 3.4 |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa | 3.7 |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa | 3.8 |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa | 3.1 |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa | 3.2 |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa | 3.9 |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa | 3.7 |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa | 4.1 |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa | 3.4 |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa | 3.8 |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa | 4.2 |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa | 3.5 |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa | 3.1 |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa | 3.6 |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa | 3.7 |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa | 3.2 |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa | 3.1 |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa | 3.4 |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa | 3.2 |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa | 3.4 |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa | 3.5 |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa | 3.2 |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa | 3.8 |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa | 3.6 |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor | 2.3 |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor | 1.9 |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor | 2.0 |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor | 1.5 |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor | 1.9 |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor | 1.2 |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor | 1.6 |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor | 1.6 |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor | 2.0 |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor | 1.3 |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor | 1.5 |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor | 1.7 |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor | 2.0 |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor | 1.4 |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor | 2.0 |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor | 2.3 |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor | 1.1 |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor | 1.7 |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor | 1.7 |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor | 1.7 |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor | 1.1 |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor | 2.1 |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor | 1.4 |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor | 1.4 |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor | 2.1 |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor | 2.2 |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor | 2.0 |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor | 1.7 |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor | 1.5 |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor | 2.2 |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor | 1.7 |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor | 1.8 |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor | 1.9 |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor | 0.9 |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor | 0.9 |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor | 1.5 |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor | 2.0 |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor | 1.9 |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor | 1.5 |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor | 1.5 |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor | 1.1 |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor | 1.5 |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor | 1.8 |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor | 1.7 |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor | 1.4 |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor | 1.5 |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor | 1.5 |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor | 1.9 |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor | 2.1 |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor | 1.6 |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica | 0.3 |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica | 0.7 |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica | 1.2 |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica | 0.7 |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica | 0.7 |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica | 1.0 |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica | 0.4 |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica | 1.0 |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica | 0.9 |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica | 1.1 |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica | 1.4 |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica | 1.1 |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica | 1.3 |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica | 0.7 |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica | 0.7 |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica | 1.1 |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica | 1.0 |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica | 1.0 |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica | 0.8 |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica | 1.0 |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica | 1.2 |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica | 0.7 |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica | 1.0 |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica | 1.4 |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica | 1.0 |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica | 1.2 |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica | 1.4 |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica | 1.2 |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica | 0.8 |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica | 1.4 |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica | 1.3 |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica | 1.5 |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica | 0.8 |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica | 1.2 |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica | 0.5 |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica | 1.6 |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica | 0.7 |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica | 0.9 |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica | 1.2 |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica | 1.5 |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica | 1.1 |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica | 1.8 |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica | 0.7 |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica | 0.9 |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica | 1.0 |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica | 1.5 |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica | 1.3 |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica | 1.3 |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica | 0.8 |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica | 0.8 |
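A single mutate() call can create several columns, and later expressions can refer to columns created earlier in the same call. In this sketch (assuming dplyr is installed) the iris measurements are in centimetres, and the column name Length_difference_mm is just an illustrative choice.

```r
library(dplyr)

iris_new <- iris %>%
  mutate(
    Length_difference = Sepal.Length - Petal.Length,
    # this expression uses Length_difference, created one line above
    Length_difference_mm = Length_difference * 10
  )

head(iris_new, 3)
```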
# To keep only the newly created column, use transmute()
iris %>%
transmute(Length_difference = Sepal.Length - Petal.Length)
| Length_difference |
|---|
| 3.7 |
| 3.5 |
| 3.4 |
| 3.1 |
| 3.6 |
| 3.7 |
| 3.2 |
| 3.5 |
| 3.0 |
| 3.4 |
| 3.9 |
| 3.2 |
| 3.4 |
| 3.2 |
| 4.6 |
| 4.2 |
| 4.1 |
| 3.7 |
| 4.0 |
| 3.6 |
| 3.7 |
| 3.6 |
| 3.6 |
| 3.4 |
| 2.9 |
| 3.4 |
| 3.4 |
| 3.7 |
| 3.8 |
| 3.1 |
| 3.2 |
| 3.9 |
| 3.7 |
| 4.1 |
| 3.4 |
| 3.8 |
| 4.2 |
| 3.5 |
| 3.1 |
| 3.6 |
| 3.7 |
| 3.2 |
| 3.1 |
| 3.4 |
| 3.2 |
| 3.4 |
| 3.5 |
| 3.2 |
| 3.8 |
| 3.6 |
| 2.3 |
| 1.9 |
| 2.0 |
| 1.5 |
| 1.9 |
| 1.2 |
| 1.6 |
| 1.6 |
| 2.0 |
| 1.3 |
| 1.5 |
| 1.7 |
| 2.0 |
| 1.4 |
| 2.0 |
| 2.3 |
| 1.1 |
| 1.7 |
| 1.7 |
| 1.7 |
| 1.1 |
| 2.1 |
| 1.4 |
| 1.4 |
| 2.1 |
| 2.2 |
| 2.0 |
| 1.7 |
| 1.5 |
| 2.2 |
| 1.7 |
| 1.8 |
| 1.9 |
| 0.9 |
| 0.9 |
| 1.5 |
| 2.0 |
| 1.9 |
| 1.5 |
| 1.5 |
| 1.1 |
| 1.5 |
| 1.8 |
| 1.7 |
| 1.4 |
| 1.5 |
| 1.5 |
| 1.9 |
| 2.1 |
| 1.6 |
| 0.3 |
| 0.7 |
| 1.2 |
| 0.7 |
| 0.7 |
| 1.0 |
| 0.4 |
| 1.0 |
| 0.9 |
| 1.1 |
| 1.4 |
| 1.1 |
| 1.3 |
| 0.7 |
| 0.7 |
| 1.1 |
| 1.0 |
| 1.0 |
| 0.8 |
| 1.0 |
| 1.2 |
| 0.7 |
| 1.0 |
| 1.4 |
| 1.0 |
| 1.2 |
| 1.4 |
| 1.2 |
| 0.8 |
| 1.4 |
| 1.3 |
| 1.5 |
| 0.8 |
| 1.2 |
| 0.5 |
| 1.6 |
| 0.7 |
| 0.9 |
| 1.2 |
| 1.5 |
| 1.1 |
| 1.8 |
| 0.7 |
| 0.9 |
| 1.0 |
| 1.5 |
| 1.3 |
| 1.3 |
| 0.8 |
| 0.8 |
Interestingly, setting an existing column to NULL inside mutate()
deletes that column.
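Both the NULL trick and an alternative to transmute() can be sketched as follows. This assumes dplyr 1.1.0 or later, where transmute() is marked as superseded and mutate(.keep = "none") is suggested in its place; with older versions, transmute() as shown above still works.

```r
library(dplyr)

# setting an existing column to NULL removes it
no_species <- iris %>% mutate(Species = NULL)
names(no_species)

# keep only the newly created column (assumes dplyr >= 1.1.0,
# where transmute() is superseded by mutate(.keep = "none"))
only_new <- iris %>%
  mutate(Length_difference = Sepal.Length - Petal.Length, .keep = "none")
names(only_new)
```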
As the name suggests, the rename() verb changes the name of an
existing column. The syntax is <new_name> = <old_name>. Example -
iris %>%
rename(Species.name=Species)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species.name |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
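rename() accepts several new_name = old_name pairs at once, and rename_with() (available since dplyr 1.0.0) applies a function to the column names instead. The short names SL and SW below are illustrative only.

```r
library(dplyr)

# several columns can be renamed in one call (new_name = old_name)
short_names <- iris %>% rename(SL = Sepal.Length, SW = Sepal.Width)
names(short_names)

# rename_with() applies a function to the names; here "." becomes "_"
underscored <- iris %>% rename_with(~ gsub(".", "_", .x, fixed = TRUE))
names(underscored)
```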
Similarly, you can rename a column while selecting it with the
select() verb. Because select() keeps only the columns you list, the
other columns have to be named too -
iris %>% select(Sepal.Length,
Sepal.Width,
Petal.Length,
Petal.Width,
Species.name=Species)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species.name |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
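The difference between renaming with select() and with rename() is easy to miss, so here is a minimal sketch (assuming dplyr is installed) contrasting the two.

```r
library(dplyr)

# select() keeps only the listed columns, so this returns a single column
just_species <- iris %>% select(Species.name = Species)
# rename() keeps every column and only changes the name
all_columns <- iris %>% rename(Species.name = Species)

names(just_species)
names(all_columns)
```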
The verb arrange() orders the rows of a data frame by the values of
one or more selected columns, in ascending order by default, like -
iris %>%
arrange(Sepal.Length)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
# Rows are sorted by Sepal.Length first; rows that share the same Sepal.Length are then sorted by Sepal.Width, and the other columns move along with their rows.
iris %>%
arrange(Sepal.Length, Sepal.Width)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
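arrange() sorts in ascending order by default. The short sketch below (assuming dplyr is installed) wraps the tie-breaking column in desc(), so Sepal.Length stays ascending while flowers that share a Sepal.Length are ordered from the widest sepal to the narrowest.

```r
library(dplyr)

# Ascending Sepal.Length; within each Sepal.Length, descending Sepal.Width
sorted <- iris %>%
  arrange(Sepal.Length, desc(Sepal.Width))

# The first row is the 4.3 flower; the three 4.4 flowers now come widest-first
head(sorted, 4)
```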
The distinct() verb keeps only the unique rows of a data-frame, judged
on the selected column(s). By default it returns only the selected
column(s); if the .keep_all parameter is changed from its
default value FALSE to TRUE, the remaining columns are
returned as well. Let’s see some examples -
iris %>% distinct(Sepal.Length)
| Sepal.Length |
|---|
| 5.1 |
| 4.9 |
| 4.7 |
| 4.6 |
| 5.0 |
| 5.4 |
| 4.4 |
| 4.8 |
| 4.3 |
| 5.8 |
| 5.7 |
| 5.2 |
| 5.5 |
| 4.5 |
| 5.3 |
| 7.0 |
| 6.4 |
| 6.9 |
| 6.5 |
| 6.3 |
| 6.6 |
| 5.9 |
| 6.0 |
| 6.1 |
| 5.6 |
| 6.7 |
| 6.2 |
| 6.8 |
| 7.1 |
| 7.6 |
| 7.3 |
| 7.2 |
| 7.7 |
| 7.4 |
| 7.9 |
# Here only the unique combinations of Sepal.Length and Sepal.Width are kept.
iris %>% distinct(Sepal.Length, Sepal.Width)
| Sepal.Length | Sepal.Width |
|---|---|
| 5.1 | 3.5 |
| 4.9 | 3.0 |
| 4.7 | 3.2 |
| 4.6 | 3.1 |
| 5.0 | 3.6 |
| 5.4 | 3.9 |
| 4.6 | 3.4 |
| 5.0 | 3.4 |
| 4.4 | 2.9 |
| 4.9 | 3.1 |
| 5.4 | 3.7 |
| 4.8 | 3.4 |
| 4.8 | 3.0 |
| 4.3 | 3.0 |
| 5.8 | 4.0 |
| 5.7 | 4.4 |
| 5.7 | 3.8 |
| 5.1 | 3.8 |
| 5.4 | 3.4 |
| 5.1 | 3.7 |
| 4.6 | 3.6 |
| 5.1 | 3.3 |
| 5.0 | 3.0 |
| 5.2 | 3.5 |
| 5.2 | 3.4 |
| 4.8 | 3.1 |
| 5.2 | 4.1 |
| 5.5 | 4.2 |
| 5.0 | 3.2 |
| 5.5 | 3.5 |
| 4.9 | 3.6 |
| 4.4 | 3.0 |
| 5.1 | 3.4 |
| 5.0 | 3.5 |
| 4.5 | 2.3 |
| 4.4 | 3.2 |
| 4.6 | 3.2 |
| 5.3 | 3.7 |
| 5.0 | 3.3 |
| 7.0 | 3.2 |
| 6.4 | 3.2 |
| 6.9 | 3.1 |
| 5.5 | 2.3 |
| 6.5 | 2.8 |
| 5.7 | 2.8 |
| 6.3 | 3.3 |
| 4.9 | 2.4 |
| 6.6 | 2.9 |
| 5.2 | 2.7 |
| 5.0 | 2.0 |
| 5.9 | 3.0 |
| 6.0 | 2.2 |
| 6.1 | 2.9 |
| 5.6 | 2.9 |
| 6.7 | 3.1 |
| 5.6 | 3.0 |
| 5.8 | 2.7 |
| 6.2 | 2.2 |
| 5.6 | 2.5 |
| 5.9 | 3.2 |
| 6.1 | 2.8 |
| 6.3 | 2.5 |
| 6.4 | 2.9 |
| 6.6 | 3.0 |
| 6.8 | 2.8 |
| 6.7 | 3.0 |
| 6.0 | 2.9 |
| 5.7 | 2.6 |
| 5.5 | 2.4 |
| 6.0 | 2.7 |
| 5.4 | 3.0 |
| 6.0 | 3.4 |
| 6.3 | 2.3 |
| 5.5 | 2.5 |
| 5.5 | 2.6 |
| 6.1 | 3.0 |
| 5.8 | 2.6 |
| 5.0 | 2.3 |
| 5.6 | 2.7 |
| 5.7 | 3.0 |
| 5.7 | 2.9 |
| 6.2 | 2.9 |
| 5.1 | 2.5 |
| 7.1 | 3.0 |
| 6.3 | 2.9 |
| 6.5 | 3.0 |
| 7.6 | 3.0 |
| 4.9 | 2.5 |
| 7.3 | 2.9 |
| 6.7 | 2.5 |
| 7.2 | 3.6 |
| 6.5 | 3.2 |
| 6.4 | 2.7 |
| 6.8 | 3.0 |
| 5.7 | 2.5 |
| 5.8 | 2.8 |
| 7.7 | 3.8 |
| 7.7 | 2.6 |
| 6.9 | 3.2 |
| 5.6 | 2.8 |
| 7.7 | 2.8 |
| 6.3 | 2.7 |
| 6.7 | 3.3 |
| 7.2 | 3.2 |
| 6.2 | 2.8 |
| 6.4 | 2.8 |
| 7.2 | 3.0 |
| 7.4 | 2.8 |
| 7.9 | 3.8 |
| 6.3 | 2.8 |
| 6.1 | 2.6 |
| 7.7 | 3.0 |
| 6.3 | 3.4 |
| 6.4 | 3.1 |
| 6.0 | 3.0 |
| 6.8 | 3.2 |
| 6.2 | 3.4 |
# With .keep_all = TRUE the rest of the columns are also returned, taken from the first row of each unique combination.
iris %>%
distinct(Sepal.Length, Sepal.Width, .keep_all = TRUE)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
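Two more uses of distinct() follow directly from the description above; this is a short sketch, assuming dplyr is loaded. Called with no columns, distinct() compares whole rows (iris contains one fully duplicated row, the pair of 5.8, 2.7, 5.1, 1.9 virginica rows visible in the sorted table earlier). With .keep_all = TRUE, the values kept for the other columns come from the first row of each unique value.

```r
library(dplyr)

# No columns given: duplicates are judged on all columns
no_dupes <- iris %>% distinct()
nrow(iris)      # 150
nrow(no_dupes)  # 149

# One row per species, taken from its first occurrence in iris
first_per_species <- iris %>%
  distinct(Species, .keep_all = TRUE)
first_per_species
```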
The slice() verb lets you index rows by their (integer)
locations. It has some helpers too -

- slice_head() selects the first row, while
slice_tail() selects the last. The same can be done using
slice(1) and slice(n()).
- slice_head(n = <int>) selects the first
<int> rows, while slice_tail(n = <int>) selects
the last <int> rows. Note that n has to be named.
- slice_sample() selects rows at random (one row unless
you set n).
- slice_min() and slice_max() select the rows
with the lowest and the highest values of the selected variable. Ties
are kept by default, so they can return more rows than requested.

A few examples -
iris %>%
slice(1)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
iris %>%
slice(10:n())
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
iris %>%
slice_min(Sepal.Length)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 4.3 | 3 | 1.1 | 0.1 | setosa |
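The remaining helpers from the list above can be tried in the same way. This is a sketch assuming dplyr 1.0 or later; the seed value is arbitrary and only makes the random draw repeatable. Five flowers share the smallest Petal.Width (0.1), which shows how slice_min() keeps ties unless with_ties = FALSE.

```r
library(dplyr)

# n must be named because it comes after ... in the helpers' arguments
first3 <- iris %>% slice_head(n = 3)
last3  <- iris %>% slice_tail(n = 3)

# Ties are kept by default, so asking for one row can return several
thinnest        <- iris %>% slice_min(Petal.Width, n = 1)
thinnest_strict <- iris %>% slice_min(Petal.Width, n = 1, with_ties = FALSE)
nrow(thinnest)         # 5
nrow(thinnest_strict)  # 1

# Random rows: fix the seed first if the draw should be reproducible
set.seed(2023)
random5 <- iris %>% slice_sample(n = 5)
```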
A disclaimer: there’s no verb called exactly
join() in dplyr (at least, to date). However,
the joins that combine columns from two data-frames come in two types -
inner_join() and the outer joins
(outer_join is not a verb either, but the name of a class of
three verbs):
left_join(),
right_join() and
full_join().
Join verbs combine the columns of two data-frames by matching rows on one or more common key columns.
inner_join() keeps only the rows whose keys are present in
both data-frames. This means that observations without a match are
silently dropped, a loss that is easy to overlook in a real-life analysis.
On the other hand, if we have two data-frames x and
y, the left_join() verb matches the keys of
x and y, keeps all the rows of
x and adds the matching columns (based on the key
column) from y. Cells with no match in y are filled with
NA values. right_join() is the opposite: it
keeps all the rows of y. Finally, full_join()
keeps all the rows of both data-frames and fills the empty cells
with NA values. When a key value occurs more than once in both
data-frames, as Species does below, every matching row of x
is paired with every matching row of y, so the output can have
more rows than either input; dplyr 1.1.0 and later warn about such a
many-to-many relationship unless you set relationship = "many-to-many".
Let’s clarify the concept with some examples -
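# slice_sample() draws rows at random, so your x and y (and the join outputs below)
# will differ from the ones shown here; call set.seed() first for a reproducible draw.
x <- iris %>%
select(Sepal.Length, Sepal.Width, Species) %>%
filter(Species %in% c("setosa", "versicolor")) %>%
slice_sample(n = 10)
y <- iris %>%
select(Petal.Length, Petal.Width, Species) %>%
filter(Species %in% c("versicolor", "virginica")) %>%
slice_sample(n = 10)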
x %>%
inner_join(y, by = "Species")
| Sepal.Length | Sepal.Width | Species | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 6.8 | 2.8 | versicolor | 4.7 | 1.4 |
| 6.8 | 2.8 | versicolor | 3.9 | 1.1 |
| 6.8 | 2.8 | versicolor | 4.5 | 1.5 |
| 6.8 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.7 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.7 | 2.8 | versicolor | 3.9 | 1.1 |
| 5.7 | 2.8 | versicolor | 4.5 | 1.5 |
| 5.7 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.8 | 2.6 | versicolor | 4.7 | 1.4 |
| 5.8 | 2.6 | versicolor | 3.9 | 1.1 |
| 5.8 | 2.6 | versicolor | 4.5 | 1.5 |
| 5.8 | 2.6 | versicolor | 4.7 | 1.4 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.4 |
| 6.0 | 2.7 | versicolor | 3.9 | 1.1 |
| 6.0 | 2.7 | versicolor | 4.5 | 1.5 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.4 |
| 6.2 | 2.9 | versicolor | 4.7 | 1.4 |
| 6.2 | 2.9 | versicolor | 3.9 | 1.1 |
| 6.2 | 2.9 | versicolor | 4.5 | 1.5 |
| 6.2 | 2.9 | versicolor | 4.7 | 1.4 |
x %>%
left_join(y, by = "Species")
| Sepal.Length | Sepal.Width | Species | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 6.8 | 2.8 | versicolor | 4.7 | 1.4 |
| 6.8 | 2.8 | versicolor | 3.9 | 1.1 |
| 6.8 | 2.8 | versicolor | 4.5 | 1.5 |
| 6.8 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.0 | 3.3 | setosa | NA | NA |
| 5.7 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.7 | 2.8 | versicolor | 3.9 | 1.1 |
| 5.7 | 2.8 | versicolor | 4.5 | 1.5 |
| 5.7 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.8 | 2.6 | versicolor | 4.7 | 1.4 |
| 5.8 | 2.6 | versicolor | 3.9 | 1.1 |
| 5.8 | 2.6 | versicolor | 4.5 | 1.5 |
| 5.8 | 2.6 | versicolor | 4.7 | 1.4 |
| 4.3 | 3.0 | setosa | NA | NA |
| 6.0 | 2.7 | versicolor | 4.7 | 1.4 |
| 6.0 | 2.7 | versicolor | 3.9 | 1.1 |
| 6.0 | 2.7 | versicolor | 4.5 | 1.5 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.4 |
| 6.2 | 2.9 | versicolor | 4.7 | 1.4 |
| 6.2 | 2.9 | versicolor | 3.9 | 1.1 |
| 6.2 | 2.9 | versicolor | 4.5 | 1.5 |
| 6.2 | 2.9 | versicolor | 4.7 | 1.4 |
| 4.8 | 3.1 | setosa | NA | NA |
| 5.8 | 4.0 | setosa | NA | NA |
| 5.0 | 3.0 | setosa | NA | NA |
x %>%
right_join(y, by = "Species")
| Sepal.Length | Sepal.Width | Species | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 6.8 | 2.8 | versicolor | 4.7 | 1.4 |
| 6.8 | 2.8 | versicolor | 3.9 | 1.1 |
| 6.8 | 2.8 | versicolor | 4.5 | 1.5 |
| 6.8 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.7 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.7 | 2.8 | versicolor | 3.9 | 1.1 |
| 5.7 | 2.8 | versicolor | 4.5 | 1.5 |
| 5.7 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.8 | 2.6 | versicolor | 4.7 | 1.4 |
| 5.8 | 2.6 | versicolor | 3.9 | 1.1 |
| 5.8 | 2.6 | versicolor | 4.5 | 1.5 |
| 5.8 | 2.6 | versicolor | 4.7 | 1.4 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.4 |
| 6.0 | 2.7 | versicolor | 3.9 | 1.1 |
| 6.0 | 2.7 | versicolor | 4.5 | 1.5 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.4 |
| 6.2 | 2.9 | versicolor | 4.7 | 1.4 |
| 6.2 | 2.9 | versicolor | 3.9 | 1.1 |
| 6.2 | 2.9 | versicolor | 4.5 | 1.5 |
| 6.2 | 2.9 | versicolor | 4.7 | 1.4 |
| NA | NA | virginica | 5.3 | 1.9 |
| NA | NA | virginica | 5.6 | 2.4 |
| NA | NA | virginica | 5.7 | 2.5 |
| NA | NA | virginica | 5.6 | 2.2 |
| NA | NA | virginica | 5.1 | 2.0 |
| NA | NA | virginica | 6.6 | 2.1 |
x %>%
full_join(y, by = "Species")
| Sepal.Length | Sepal.Width | Species | Petal.Length | Petal.Width |
|---|---|---|---|---|
| 6.8 | 2.8 | versicolor | 4.7 | 1.4 |
| 6.8 | 2.8 | versicolor | 3.9 | 1.1 |
| 6.8 | 2.8 | versicolor | 4.5 | 1.5 |
| 6.8 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.0 | 3.3 | setosa | NA | NA |
| 5.7 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.7 | 2.8 | versicolor | 3.9 | 1.1 |
| 5.7 | 2.8 | versicolor | 4.5 | 1.5 |
| 5.7 | 2.8 | versicolor | 4.7 | 1.4 |
| 5.8 | 2.6 | versicolor | 4.7 | 1.4 |
| 5.8 | 2.6 | versicolor | 3.9 | 1.1 |
| 5.8 | 2.6 | versicolor | 4.5 | 1.5 |
| 5.8 | 2.6 | versicolor | 4.7 | 1.4 |
| 4.3 | 3.0 | setosa | NA | NA |
| 6.0 | 2.7 | versicolor | 4.7 | 1.4 |
| 6.0 | 2.7 | versicolor | 3.9 | 1.1 |
| 6.0 | 2.7 | versicolor | 4.5 | 1.5 |
| 6.0 | 2.7 | versicolor | 4.7 | 1.4 |
| 6.2 | 2.9 | versicolor | 4.7 | 1.4 |
| 6.2 | 2.9 | versicolor | 3.9 | 1.1 |
| 6.2 | 2.9 | versicolor | 4.5 | 1.5 |
| 6.2 | 2.9 | versicolor | 4.7 | 1.4 |
| 4.8 | 3.1 | setosa | NA | NA |
| 5.8 | 4.0 | setosa | NA | NA |
| 5.0 | 3.0 | setosa | NA | NA |
| NA | NA | virginica | 5.3 | 1.9 |
| NA | NA | virginica | 5.6 | 2.4 |
| NA | NA | virginica | 5.7 | 2.5 |
| NA | NA | virginica | 5.6 | 2.2 |
| NA | NA | virginica | 5.1 | 2.0 |
| NA | NA | virginica | 6.6 | 2.1 |
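Because Species is repeated in both x and y, the outputs above multiply rows. With a key that is unique in each table, the four joins are easier to compare. The sketch below uses two small hypothetical data-frames (the names patients, biopsies, patient_id, age and grade are made up for illustration) and shows which keys survive each join.

```r
library(dplyr)

# Hypothetical toy tables; patient_id is a unique key in each of them
patients <- data.frame(patient_id = c(1, 2, 3), age = c(54, 61, 47))
biopsies <- data.frame(patient_id = c(2, 3, 4), grade = c("G2", "G3", "G1"))

inner <- inner_join(patients, biopsies, by = "patient_id")  # ids 2, 3
left  <- left_join(patients, biopsies, by = "patient_id")   # ids 1, 2, 3; grade is NA for id 1
right <- right_join(patients, biopsies, by = "patient_id")  # ids 2, 3, 4; age is NA for id 4
full  <- full_join(patients, biopsies, by = "patient_id")   # ids 1, 2, 3, 4
```

Comparing nrow(patients) with nrow(inner) is a quick check of how many observations an inner join has dropped.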
I will describe the group_by() and
summarise() verbs together, because the effect of the
former is easiest to see through the latter. group_by() is the most
important grouping verb in dplyr. It takes one or more
variables of the data-frame to group by -
iris %>%
group_by(Species)
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 5.0 | 3.3 | 1.4 | 0.2 | setosa |
| 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
| 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 5.9 | 3.0 | 5.1 | 1.8 | virginica |
Apart from the grouping information shown in the output, you don’t see any change in the structure of the iris data frame yet. Let’s select Sepal.Length and see the effect -
iris %>%
group_by(Species) %>%
select(Sepal.Length)
| Species | Sepal.Length |
|---|---|
| setosa | 5.1 |
| setosa | 4.9 |
| setosa | 4.7 |
| setosa | 4.6 |
| setosa | 5.0 |
| setosa | 5.4 |
| setosa | 4.6 |
| setosa | 5.0 |
| setosa | 4.4 |
| setosa | 4.9 |
| setosa | 5.4 |
| setosa | 4.8 |
| setosa | 4.8 |
| setosa | 4.3 |
| setosa | 5.8 |
| setosa | 5.7 |
| setosa | 5.4 |
| setosa | 5.1 |
| setosa | 5.7 |
| setosa | 5.1 |
| setosa | 5.4 |
| setosa | 5.1 |
| setosa | 4.6 |
| setosa | 5.1 |
| setosa | 4.8 |
| setosa | 5.0 |
| setosa | 5.0 |
| setosa | 5.2 |
| setosa | 5.2 |
| setosa | 4.7 |
| setosa | 4.8 |
| setosa | 5.4 |
| setosa | 5.2 |
| setosa | 5.5 |
| setosa | 4.9 |
| setosa | 5.0 |
| setosa | 5.5 |
| setosa | 4.9 |
| setosa | 4.4 |
| setosa | 5.1 |
| setosa | 5.0 |
| setosa | 4.5 |
| setosa | 4.4 |
| setosa | 5.0 |
| setosa | 5.1 |
| setosa | 4.8 |
| setosa | 5.1 |
| setosa | 4.6 |
| setosa | 5.3 |
| setosa | 5.0 |
| versicolor | 7.0 |
| versicolor | 6.4 |
| versicolor | 6.9 |
| versicolor | 5.5 |
| versicolor | 6.5 |
| versicolor | 5.7 |
| versicolor | 6.3 |
| versicolor | 4.9 |
| versicolor | 6.6 |
| versicolor | 5.2 |
| versicolor | 5.0 |
| versicolor | 5.9 |
| versicolor | 6.0 |
| versicolor | 6.1 |
| versicolor | 5.6 |
| versicolor | 6.7 |
| versicolor | 5.6 |
| versicolor | 5.8 |
| versicolor | 6.2 |
| versicolor | 5.6 |
| versicolor | 5.9 |
| versicolor | 6.1 |
| versicolor | 6.3 |
| versicolor | 6.1 |
| versicolor | 6.4 |
| versicolor | 6.6 |
| versicolor | 6.8 |
| versicolor | 6.7 |
| versicolor | 6.0 |
| versicolor | 5.7 |
| versicolor | 5.5 |
| versicolor | 5.5 |
| versicolor | 5.8 |
| versicolor | 6.0 |
| versicolor | 5.4 |
| versicolor | 6.0 |
| versicolor | 6.7 |
| versicolor | 6.3 |
| versicolor | 5.6 |
| versicolor | 5.5 |
| versicolor | 5.5 |
| versicolor | 6.1 |
| versicolor | 5.8 |
| versicolor | 5.0 |
| versicolor | 5.6 |
| versicolor | 5.7 |
| versicolor | 5.7 |
| versicolor | 6.2 |
| versicolor | 5.1 |
| versicolor | 5.7 |
| virginica | 6.3 |
| virginica | 5.8 |
| virginica | 7.1 |
| virginica | 6.3 |
| virginica | 6.5 |
| virginica | 7.6 |
| virginica | 4.9 |
| virginica | 7.3 |
| virginica | 6.7 |
| virginica | 7.2 |
| virginica | 6.5 |
| virginica | 6.4 |
| virginica | 6.8 |
| virginica | 5.7 |
| virginica | 5.8 |
| virginica | 6.4 |
| virginica | 6.5 |
| virginica | 7.7 |
| virginica | 7.7 |
| virginica | 6.0 |
| virginica | 6.9 |
| virginica | 5.6 |
| virginica | 7.7 |
| virginica | 6.3 |
| virginica | 6.7 |
| virginica | 7.2 |
| virginica | 6.2 |
| virginica | 6.1 |
| virginica | 6.4 |
| virginica | 7.2 |
| virginica | 7.4 |
| virginica | 7.9 |
| virginica | 6.4 |
| virginica | 6.3 |
| virginica | 6.1 |
| virginica | 7.7 |
| virginica | 6.3 |
| virginica | 6.4 |
| virginica | 6.0 |
| virginica | 6.9 |
| virginica | 6.7 |
| virginica | 6.9 |
| virginica | 5.8 |
| virginica | 6.8 |
| virginica | 6.7 |
| virginica | 6.7 |
| virginica | 6.3 |
| virginica | 6.5 |
| virginica | 6.2 |
| virginica | 5.9 |
Though I selected only Sepal.Length, the Species column also appears. That’s because we applied the group_by() verb beforehand: select() always keeps the grouping variables (and tells you so in a message). But the most dramatic effect can be seen in conjunction with the summarise() verb.
summarise() creates a new data frame and returns one row (with the result, of course) for each combination of grouping variables. If there are no grouping variables, the output has a single row summarising all observations in the input. Now, let’s see the effect of group_by() in conjunction with summarise() -
iris %>%
group_by(Species) %>%
select(Sepal.Length) %>%
summarise(count=n())
| Species | count |
|---|---|
| setosa | 50 |
| versicolor | 50 |
| virginica | 50 |
iris %>%
group_by(Species) %>%
select(Sepal.Length) %>%
summarise(mean_Sepal_length=mean(Sepal.Length))
| Species | mean_Sepal_length |
|---|---|
| setosa | 5.006 |
| versicolor | 5.936 |
| virginica | 6.588 |
# However, without any grouping -
iris %>%
select(Sepal.Length) %>%
summarise(mean_Sepal_length=mean(Sepal.Length))
| mean_Sepal_length |
|---|
| 5.843333 |
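summarise() can also return several statistics per group in one go. Below is a minimal sketch on iris. The column names n, mean_SL and sd_SL are my own choices. The .groups = "drop" argument (available in dplyr 1.0 and later) makes the result come back ungrouped:

```r
library(dplyr)

# one row per species, several summary columns at once
iris_summary <- iris %>%
  group_by(Species) %>%
  summarise(n       = n(),                  # number of rows in each group
            mean_SL = mean(Sepal.Length),   # group mean
            sd_SL   = sd(Sepal.Length),     # group standard deviation
            .groups = "drop")               # return an ungrouped tibble

iris_summary
```

The mean_SL column should match the grouped result shown above.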
Now, it’s time for a mini exercise. Install the R package gapminder, where you will find a dataset called gapminder. For each continent, calculate the mean life expectancy at birth using only the data collected in 2002 or later (2002 inclusive). The answer will look like below -
| continent | mean_LE |
|---|---|
| Oceania | 80.22975 |
| Europe | 77.17460 |
| Americas | 73.01508 |
| Asia | 69.98118 |
| Africa | 54.06563 |
As a second exercise, find the ten countries with the highest mean life expectancy at birth over the same period. The result will look like this -
| country | mean_LE |
|---|---|
| Japan | 82.3015 |
| Hong Kong, China | 81.8515 |
| Switzerland | 81.1605 |
| Iceland | 81.1285 |
| Australia | 80.8025 |
| Sweden | 80.4620 |
| Italy | 80.3930 |
| Spain | 80.3605 |
| Israel | 80.2205 |
| Canada | 80.2115 |
ggplot2
In my opinion, the most elegant package for data visualisation in R is ggplot2. Here, gg stands for the grammar of graphics. Put aside what you have learnt so far about base R plotting: ggplot2 defines the art of plotting in a whole new way. The learning curve may be steep, but once you learn it, you will fall in love with it (I promise). You provide the data, tell ggplot2 which variables to map to which aesthetics and what type of plot you want to draw, and ggplot2 takes care of the rest.
The easiest way to get ggplot2 is to install the whole
tidyverse:
install.packages("tidyverse")
Alternatively, install just ggplot2:
install.packages("ggplot2")
Or install the development version from GitHub:
install.packages("devtools")
devtools::install_github("tidyverse/ggplot2")
And then, load it …
library(ggplot2)
In this chapter, I will use the mtcars dataset to draw different graphs. To refresh your memory, let’s have a look at the dataset -
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Now, I will draw a scatter plot, first using the base R plot() function, and then using ggplot2.
plot(x=mtcars$mpg, y=mtcars$wt)
ggplot(data = mtcars, mapping = aes(x=mpg,y=wt)) +
geom_point()
You can see the stark difference between them.
For plotting with ggplot2, you start with the ggplot() function and provide the data. You then specify the aesthetic mapping using mapping = aes(). Then, you add layers (like geom_point()), scales (like scale_x_continuous()), faceting specifications (like facet_wrap()) and coordinate systems (like coord_flip()).
In short, these are the elements that you might see in a ggplot() call -
data
aesthetic mapping
geometric objects
statistical transformations
scales
coordinate systems
position adjustments
faceting
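To see how these elements fit together, here is a sketch that touches each of them in a single call on mtcars. The particular choices are only for illustration: weight on the x-axis, colouring by cyl and faceting by am. stat = "identity" and position = "identity" are already the defaults for geom_point(); they are spelt out here only to make those two elements visible:

```r
library(ggplot2)

# one call that touches each element of the grammar of graphics
p_grammar <- ggplot(data = mtcars,                           # data
                    mapping = aes(x = wt, y = mpg,           # aesthetic mapping
                                  colour = factor(cyl))) +
  geom_point(stat = "identity",                              # geometric object + statistical transformation
             position = "identity") +                        # position adjustment
  scale_x_continuous(name = "Weight (1000 lbs)") +           # scale
  coord_cartesian(ylim = c(10, 35)) +                        # coordinate system (zooms without dropping data)
  facet_wrap(~ am)                                           # faceting: one panel per transmission type

p_grammar
```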
You can specify the different layers of a plot and combine them using the “+” operator. Now, I will dive into different aspects of the ggplot() function -
aes()
Here, aesthetic means something that you can see. It is mainly the mapping between a visual attribute and a variable. These are some important aesthetics -
position (x,y)
colour (basically the colour of the outer rim of the object)
fill (the filling-colour/inside-colour of the object)
shape (mainly of point)
line type
size etc
You can read all about them in the ggplot2 vignette on aesthetic specifications by typing -
vignette("ggplot2-specs", package = "ggplot2")
geom_
There are many geom objects in ggplot2, like -
geom_point()
geom_line()
geom_boxplot()
You can list all the geom objects by typing -
help.search("geom_", package = "ggplot2")
Now it’s time to check what I have just mentioned, but before that (as usual) let’s check the data that we are going to use. I will switch to another dataset, called mpg, that comes with ggplot2.
?mpg
I will now draw a scatter plot using highway miles per gallon as a function of engine displacement (in litres) -
ggplot(data=mpg, aes(x=displ, y=hwy)) +
geom_point()
Interestingly, you can save the whole or part of the code snippet in a variable -
# the plot can be saved in a variable first, then printed. Like -
p1 <- ggplot(data=mpg, aes(x=displ, y=hwy)) + geom_point()
# now invoke it
p1
# or
p <- ggplot(data=mpg, aes(x=displ, y=hwy)) # save the base plot in p, then add different layers to it
p2 <- p + geom_point()
p3 <- p + geom_line()
p4 <- p + geom_smooth()
p5 <- p2 + geom_smooth(se = FALSE, linetype="dashed")
p5
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
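A ggplot is an ordinary R object, so you can also keep reusable layers in a list and add the whole list with +. ggplot2 then adds the elements one by one. Below is a small sketch; the names smooth_layers and p_list are hypothetical. Spelling out method and formula silences the message shown above:

```r
library(ggplot2)

p <- ggplot(data = mpg, aes(x = displ, y = hwy))   # base plot without layers

# store reusable layers in a list
smooth_layers <- list(
  geom_point(),
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, linetype = "dashed")
)

p_list <- p + smooth_layers   # each list element is added in turn
p_list
```

This should draw the same figure as p5 above, and smooth_layers can be added to any other base plot.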
Now let’s play with colour and size -
p + geom_point(colour="red", alpha = 0.2, size = 3) # set outside aes(): the same value applies to every point
p + geom_point(aes(colour=year, shape=factor(cyl)), size = 3) # mapped inside aes(): the value varies with the data
If you want to play with different shades of colours in your plots, this is a good place to start. Note that the default colour scheme is not colour-blind friendly; you can find a colour-blind-friendly palette following this link.
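One colour-blind-friendly option that ships with ggplot2 itself is the viridis family of scales. These are designed to stay readable for viewers with common forms of colour blindness. Below is a minimal sketch; colouring by drv is just an example:

```r
library(ggplot2)

# discrete viridis scale; end = 0.9 avoids the palest yellow at the top of the range
p_cb <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point(size = 3) +
  scale_colour_viridis_d(option = "D", end = 0.9)

p_cb
```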
You can play with title and axis labels -
p +
geom_point(aes(colour=year), size = 3, alpha = 0.2) +
#geom_text(aes(label=model)) + # probably not a good idea here: the labels would overlap
labs(
title = "Fuel efficiency vs Engine displacement",
subtitle = "Fuel efficiency decreases with the engine size",
caption = "Two-seater is an exception",
x = "Engine displacement (L)",
y = "Highway fuel economy (mpg)",
colour = "Manufacture year"
)
If your data points overlap, you can jitter them a bit -
p +
geom_point(aes(colour=class), size = 3, position = "jitter") # adds a small random shift to each point; geom_jitter() lets you control the amount
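geom_jitter() is a shortcut for geom_point(position = "jitter") that exposes the amount of jitter. Its width and height arguments set the maximum random shift in data units; the values 0.1 and 0 below are arbitrary choices for illustration. position_jitter() also accepts a seed, which makes the jitter reproducible:

```r
library(ggplot2)

# shift points horizontally by at most 0.1 litres; keep hwy values untouched
p_jit <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5)

# the same jitter, but with a fixed seed so the plot looks identical every time
p_jit_seed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(position = position_jitter(width = 0.1, height = 0, seed = 42))

p_jit_seed
```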
Let’s play with some scaling -
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
scale_x_continuous(name = "x-axis label changed", breaks = seq(0,10,by=5),limits = c(0,10)) +
scale_y_continuous(trans = "reverse")
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
scale_colour_brewer(palette = "Set1") # the scale_colour_*() family controls how colours are mapped
You can play with the positioning of the legend, too -
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
theme(legend.position = "left")
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
theme(legend.position = "none")
I will discuss coordinate systems with the box plot later in this chapter.
If you have too many data points, faceting lets you split the plot into sub-plots by an appropriate variable -
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
facet_wrap(~ class, ncol = 2)
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
facet_grid(~ class) # one row of panels; with two faceting variables, facet_grid() also draws empty combinations, which facet_wrap() drops
There are different themes to play with -
p +
geom_point(aes(colour=class), size = 3, alpha = 0.2) +
theme_void()
By default, bars in a bar plot are stacked: if you fill them by a variable other than the one mapped to the x-axis, you can see what I mean. For playing with bar plots, I will use another dataset, called diamonds, that comes with ggplot2.
To begin with -
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut))
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=cut))
But -
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity))
The position is adjusted by the position argument; besides the default, “stack”, it takes options such as “identity”, “fill” and “dodge”
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position = "identity")
Here, each bar starts from zero and is drawn exactly where it falls, so the bars overlap each other. It looks a little better if you use fill = NA or an alpha value
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position = "identity", alpha = 0.2)
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, colour=clarity), position = "identity", fill=NA) # mind the change of colour and fill
Position fill stacks the bars and stretches each stack to the same height, so each bar shows the fraction of each clarity level within that cut
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position = "fill")
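To check what position = "fill" is showing, you can compute the same fractions with dplyr. Count the diamonds in each cut and clarity combination, then divide by the total for that cut. Below is a sketch; the name clarity_share is hypothetical:

```r
library(dplyr)
library(ggplot2)   # provides the diamonds dataset

# share of each clarity level within each cut (what each filled bar shows)
clarity_share <- diamonds %>%
  count(cut, clarity) %>%          # number of diamonds per cut/clarity combination
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%    # proportion within each cut, so each bar sums to 1
  ungroup()

head(clarity_share)
```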
But what we usually mean by a bar plot is the next one, with the bars side by side -
ggplot(data=diamonds) +
geom_bar(mapping = aes(x=cut, fill=clarity), position = "dodge")
A box plot is very convenient for seeing the distribution of your data and for comparing the distributions of different groups side by side -
ggplot(mpg, aes(class, hwy)) +
geom_boxplot() +
coord_flip()
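Each box is built from a handful of numbers. The sketch below computes them per class with dplyr, assuming ggplot2's default box plot rules:

- the box spans the first to the third quartile;
- the line inside the box marks the median;
- the whiskers reach the most extreme points within 1.5 * IQR of the box;
- anything beyond the whiskers is drawn as an outlier.

The column names are my own choices:

```r
library(dplyr)
library(ggplot2)   # provides the mpg dataset

# the numbers behind each box: quartiles and the 1.5 * IQR outlier fences
hwy_box_stats <- mpg %>%
  group_by(class) %>%
  summarise(q1          = quantile(hwy, 0.25),
            median      = median(hwy),
            q3          = quantile(hwy, 0.75),
            iqr         = q3 - q1,
            lower_fence = q1 - 1.5 * iqr,   # points below this are drawn as outliers
            upper_fence = q3 + 1.5 * iqr,   # points above this are drawn as outliers
            .groups     = "drop")

hwy_box_stats
```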
ggplot(mpg, aes(class, hwy)) +
geom_boxplot() +
coord_polar()
# Please don't plot a box plot this way in real life.
Let’s re-construct this plot. There is an interesting reason behind my backward approach: I described the dataset and variables to ChatGPT and asked it to write a code snippet, and it produced something close to what I wanted. Now, I want you to start from the beginning. Here is some information that you will need -
You will need the midwest dataset that comes with
the ggplot2 package.
Using geom_point(), draw a scatter plot of the variables area and poptotal.
Colour the points by state, and set the size of them
by variable popdensity.
Use geom_smooth() verb to visualise the relationship
between variables area and poptotal using
loess method. Get rid of the confidence interval around the
smooth line.
Adjust the x- and y-axis accordingly.
Annotate the plot accordingly.
Now it’s our turn to apply the techniques that we have learned so far in this workshop. In this section, we will explore some datasets that were part of a study characterising the genomic mutations (SNVs and CNAs) and gene expression profiles of over 2000 primary breast tumours. Detailed clinical information for this study is also available from cBioPortal, alongside the experimental data. The study was published in two prominent publications -
Curtis et al., Nature 486:346-52, 2012
Pereira et al., Nature Communications 7:11479, 2016
FYI, the gene expression data were generated using microarrays, genome-wide copy number profiles were obtained using SNP microarrays, and targeted sequencing was performed using a panel of 40 driver-mutation genes to detect mutations (single nucleotide variants).
Let’s download the data and save it in a folder (if you have not done it already). We will be plotting different aspects of the patient related information in our exploratory data analysis (EDA) workshop today. And for that, we will merge and format the data provided.
Now, let’s load the data one by one using the function
read.delim with appropriate parameters -
library(dplyr)
library(ggplot2)
# Load patient data and explore a few of the columns (e.g. BREAST_SURGERY, CELLULARITY, CHEMOTHERAPY, ER_IHC) -
# (change the file paths below to wherever you saved the data)
patient_data <- read.delim("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_clinical_patient.txt",comment.char = "#", sep = "\t")
patient_data %>% pull(BREAST_SURGERY) %>% table
## .
## BREAST CONSERVING MASTECTOMY
## 554 785 1170
patient_data %>% pull(CELLULARITY) %>% table
## .
## High Low Moderate
## 592 965 215 737
patient_data %>% pull(CHEMOTHERAPY) %>% table
## .
## NO YES
## 529 1568 412
patient_data %>% pull(ER_IHC) %>% table
## .
## Negative Positve
## 83 609 1817
# Load sample data and explore the ER_STATUS
sample_data <- read.delim("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_clinical_sample.txt",comment.char = "#", sep = "\t")
sample_data %>% pull(ER_STATUS) %>% table
## .
## Negative Positive
## 644 1825
# Load CNA data and explore
CNA_data <- read.table("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_cna.txt",header = T, sep = "\t") %>%
select(-Entrez_Gene_Id) %>%
distinct(Hugo_Symbol, .keep_all = T)
CNA_data[1:10, 1:10]
## Hugo_Symbol MB.0000 MB.0039 MB.0045 MB.0046 MB.0048 MB.0050 MB.0053 MB.0062
## 1 A1BG 0 0 -1 0 0 0 0 -1
## 2 A1BG-AS1 0 0 -1 0 0 0 0 -1
## 3 A1CF 0 0 0 0 1 0 0 0
## 4 A2M 0 0 -1 -1 0 0 0 2
## 5 A2M-AS1 0 0 -1 -1 0 0 0 2
## 6 A2ML1 0 0 -1 -1 0 0 0 2
## 7 A2MP1 0 0 -1 -1 0 0 0 2
## 8 A3GALT2 0 0 0 0 0 0 0 -1
## 9 A4GALT 0 0 0 -1 -1 -1 0 1
## 10 A4GNT 0 0 2 0 0 0 1 1
## MB.0064
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## 7 0
## 8 0
## 9 0
## 10 0
# Load mutation data and explore
mutation_data <- read.delim("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_mutations.txt",comment.char = "#", sep = "\t")
mutation_data %>% head()
## Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
## 1 TP53 NA METABRIC GRCh37 17 7579344
## 2 TP53 NA METABRIC GRCh37 17 7579346
## 3 MLLT4 NA METABRIC GRCh37 6 168299111
## 4 NF2 NA METABRIC GRCh37 22 29999995
## 5 SF3B1 NA METABRIC GRCh37 2 198288682
## 6 NT5E NA METABRIC GRCh37 6 86195125
## End_Position Strand Consequence Variant_Classification
## 1 7579345 + frameshift_variant Frame_Shift_Ins
## 2 7579347 + protein_altering_variant In_Frame_Ins
## 3 168299111 + missense_variant Missense_Mutation
## 4 29999995 + missense_variant Missense_Mutation
## 5 198288682 + synonymous_variant Silent
## 6 86195125 + synonymous_variant Silent
## Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS
## 1 INS - - G NA
## 2 INS - - CAG NA
## 3 SNP G G T NA
## 4 SNP G G T NA
## 5 SNP A A T NA
## 6 SNP T T C NA
## dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode
## 1 NA MTS-T0058 NA
## 2 NA MTS-T0058 NA
## 3 NA MTS-T0058 NA
## 4 NA MTS-T0058 NA
## 5 NA MTS-T0059 NA
## 6 NA MTS-T0059 NA
## Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Tumor_Validation_Allele2 Match_Norm_Validation_Allele1
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## Match_Norm_Validation_Allele2 Verification_Status Validation_Status
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 NA NA NA NA NA
## BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count
## 1 NA Illumina HiSeq 2,000 NA NA NA NA
## 2 NA Illumina HiSeq 2,000 NA NA NA NA
## 3 NA Illumina HiSeq 2,000 NA NA NA NA
## 4 NA Illumina HiSeq 2,000 NA NA NA NA
## 5 NA Illumina HiSeq 2,000 NA NA NA NA
## 6 NA Illumina HiSeq 2,000 NA NA NA NA
## HGVSc HGVSp HGVSp_Short
## 1 ENST00000269305.4:c.343dup p.His115ProfsTer34 p.H115Pfs*34
## 2 ENST00000269305.4:c.340_341insCTG p.Leu114delinsSerVal p.L114delinsSV
## 3 ENST00000392108.3:c.1544G>T p.Gly515Val p.G515V
## 4 ENST00000338641.4:c.8G>T p.Gly3Val p.G3V
## 5 ENST00000335508.6:c.45T>A p.Ile15= p.I15=
## 6 ENST00000257770.3:c.924T>C p.Ile308= p.I308=
## Transcript_ID RefSeq Protein_position Codons Hotspot
## 1 ENST00000269305 NM_001126112.2 114 -/C 0
## 2 ENST00000269305 NM_001126112.2 114 ttg/tCTGtg 0
## 3 ENST00000392108 NM_001040000.2 515 gGa/gTa 0
## 4 ENST00000338641 NM_000268.3 3 gGg/gTg 0
## 5 ENST00000335508 NM_012433.2 15 atT/atA 0
## 6 ENST00000257770 NM_002526.3 308 atT/atC 0
# Load expression data and explore
expression_data <- read.delim("/Users/mahedi/Documents/Collaborations/UCL_CI/metabric/brca_metabric/data_mrna_agilent_microarray.txt",comment.char = "#", sep = "\t", header = T)
expression_data[1:10, 1:10]
## Hugo_Symbol Entrez_Gene_Id MB.0362 MB.0346 MB.0386 MB.0574 MB.0185
## 1 RERE 473 8.676978 9.653589 9.033589 8.814855 8.736406
## 2 RNF165 494470 6.075331 6.687887 5.910885 5.628740 6.392422
## 3 PHF7 51533 5.838270 5.600876 6.030718 5.849428 5.542133
## 4 CIDEA 1149 6.397503 5.246319 10.111816 6.116868 5.184098
## 5 TENT2 167153 7.906217 8.267256 7.959291 9.206376 8.162845
## 6 SLC17A3 10786 5.702379 5.521794 5.689533 5.439130 5.464326
## 7 SDS 10993 6.930741 6.141689 6.529312 6.430102 6.105427
## 8 ATP6V1C2 245973 5.332863 7.563477 5.482155 5.398675 5.026018
## 9 F3 2152 5.275676 5.376381 5.463788 5.409761 5.338580
## 10 FAM71C 196472 5.443896 5.319857 5.254294 5.512298 5.430874
## MB.0503 MB.0641 MB.0201
## 1 9.274265 9.286585 8.437347
## 2 5.908698 6.206729 6.095592
## 3 5.964661 5.783374 5.737572
## 4 7.828171 8.744149 5.480091
## 5 8.706646 8.518929 7.478413
## 6 5.417484 5.629885 5.686286
## 7 6.684893 5.632753 5.866132
## 8 5.266674 5.701353 6.403136
## 9 5.490693 5.363266 6.341856
## 10 5.363378 5.191612 5.208379
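Two details of these files are worth a closer look:

- cBioPortal clinical files start with metadata lines beginning with #, which comment.char = "#" skips.
- Empty cells are read as "". That is why the table() outputs above show an unnamed first category (e.g. 554 patients for BREAST_SURGERY).

The sketch below uses a small made-up file in the same layout to show how na.strings turns those empty cells into NA. One caveat of comment.char: a # inside a data field would also cut off the rest of that line.

```r
# toy file in the same layout as the cBioPortal clinical files (made-up values):
# metadata lines starting with "#", then a tab-separated header and rows
toy_txt <- paste(
  "#Patient Identifier\tType of Breast Surgery",
  "#STRING\tSTRING",
  "PATIENT_ID\tBREAST_SURGERY",
  "MB-0001\tMASTECTOMY",
  "MB-0002\t",
  "MB-0003\tBREAST CONSERVING",
  sep = "\n"
)

# comment.char = "#" skips the metadata lines;
# na.strings turns empty cells into NA instead of ""
toy <- read.delim(text = toy_txt, comment.char = "#", sep = "\t",
                  na.strings = c("", "NA"))

table(toy$BREAST_SURGERY, useNA = "ifany")   # counts missing values explicitly
```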
To begin with, let’s explore the mutation data a bit by
plotting the frequency of different types of mutations -
head(mutation_data)
## Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position
## 1 TP53 NA METABRIC GRCh37 17 7579344
## 2 TP53 NA METABRIC GRCh37 17 7579346
## 3 MLLT4 NA METABRIC GRCh37 6 168299111
## 4 NF2 NA METABRIC GRCh37 22 29999995
## 5 SF3B1 NA METABRIC GRCh37 2 198288682
## 6 NT5E NA METABRIC GRCh37 6 86195125
## End_Position Strand Consequence Variant_Classification
## 1 7579345 + frameshift_variant Frame_Shift_Ins
## 2 7579347 + protein_altering_variant In_Frame_Ins
## 3 168299111 + missense_variant Missense_Mutation
## 4 29999995 + missense_variant Missense_Mutation
## 5 198288682 + synonymous_variant Silent
## 6 86195125 + synonymous_variant Silent
## Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS
## 1 INS - - G NA
## 2 INS - - CAG NA
## 3 SNP G G T NA
## 4 SNP G G T NA
## 5 SNP A A T NA
## 6 SNP T T C NA
## dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode
## 1 NA MTS-T0058 NA
## 2 NA MTS-T0058 NA
## 3 NA MTS-T0058 NA
## 4 NA MTS-T0058 NA
## 5 NA MTS-T0059 NA
## 6 NA MTS-T0059 NA
## Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Tumor_Validation_Allele2 Match_Norm_Validation_Allele1
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## Match_Norm_Validation_Allele2 Verification_Status Validation_Status
## 1 NA NA NA
## 2 NA NA NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 NA NA NA NA NA
## BAM_File Sequencer t_ref_count t_alt_count n_ref_count n_alt_count
## 1 NA Illumina HiSeq 2,000 NA NA NA NA
## 2 NA Illumina HiSeq 2,000 NA NA NA NA
## 3 NA Illumina HiSeq 2,000 NA NA NA NA
## 4 NA Illumina HiSeq 2,000 NA NA NA NA
## 5 NA Illumina HiSeq 2,000 NA NA NA NA
## 6 NA Illumina HiSeq 2,000 NA NA NA NA
## HGVSc HGVSp HGVSp_Short
## 1 ENST00000269305.4:c.343dup p.His115ProfsTer34 p.H115Pfs*34
## 2 ENST00000269305.4:c.340_341insCTG p.Leu114delinsSerVal p.L114delinsSV
## 3 ENST00000392108.3:c.1544G>T p.Gly515Val p.G515V
## 4 ENST00000338641.4:c.8G>T p.Gly3Val p.G3V
## 5 ENST00000335508.6:c.45T>A p.Ile15= p.I15=
## 6 ENST00000257770.3:c.924T>C p.Ile308= p.I308=
## Transcript_ID RefSeq Protein_position Codons Hotspot
## 1 ENST00000269305 NM_001126112.2 114 -/C 0
## 2 ENST00000269305 NM_001126112.2 114 ttg/tCTGtg 0
## 3 ENST00000392108 NM_001040000.2 515 gGa/gTa 0
## 4 ENST00000338641 NM_000268.3 3 gGg/gTg 0
## 5 ENST00000335508 NM_012433.2 15 atT/atA 0
## 6 ENST00000257770 NM_002526.3 308 atT/atC 0
ggplot(data=mutation_data,mapping = aes(Variant_Classification, fill=Variant_Classification)) +
geom_bar() +
coord_flip()
Now we will build a word cloud of genes that had been affected by mutations -
# install.packages("wordcloud")
library(wordcloud)
## Loading required package: RColorBrewer
# We need the gene name and how many times they are affected by any non-synonymous mutation -
mutation_wordcloud_data <- mutation_data %>%
filter(Consequence != "synonymous_variant") %>%
group_by(Hugo_Symbol) %>%
summarise(freq=n()) %>%
rename(word=Hugo_Symbol)
mutation_wordcloud_data %>% head
## # A tibble: 6 × 2
## word freq
## <chr> <int>
## 1 ACVRL1 13
## 2 AFF2 44
## 3 AGMO 32
## 4 AGTR2 14
## 5 AHNAK 246
## 6 AHNAK2 537
# Let's find out some highly affected genes -
ggplot(mutation_wordcloud_data %>% filter(freq > 100)) +
geom_col(aes(word, freq)) +
coord_flip()
# Now create the word cloud
wordcloud(words = mutation_wordcloud_data %>% pull(word),
freq = mutation_wordcloud_data %>% pull(freq),
scale = c(5, 0.5), # size range of the words: c(largest, smallest)
max.words = 100, # plot at most the 100 most frequent words
random.order = FALSE, # plot words in decreasing frequency (most frequent in the centre)
rot.per = 0.35, # proportion of words rotated by 90 degrees
use.r.layout = FALSE, # FALSE uses the C++ collision detection
colors = brewer.pal(8, "Dark2"))
Now, we will subset the loaded data so that we can merge (or join) them together later. We will create new datasets containing -
Frequency of mutations per patient from
mutation_data.
Expression data for selected (but important) genes:
"GATA3","FOXA1","MLPH","ESR1","ERBB2","PGR","TP53","PIK3CA", "AKT1", "PTEN", "PIK3R1", "FOXO3","RB1", "KMT2C", "ARID1A", "NCOR1","CTCF","MAP3K1","NF1","CDH1","TBX3","CBFB","RUNX1", "USP9X","SF3B1"
Sub-setting sample_data using selected columns:
PATIENT_ID, SAMPLE_ID, CANCER_TYPE, CANCER_TYPE_DETAILED, ER_STATUS, HER2_STATUS, PR_STATUS, GRADE.
Sub-setting patient_data using selected columns:
PATIENT_ID, THREEGENE, AGE_AT_DIAGNOSIS, CELLULARITY, CHEMOTHERAPY, ER_IHC, HORMONE_THERAPY, INTCLUST, NPI, CLAUDIN_SUBTYPE.
Finally, we will combine all the data by patient_ID to create a master dataset that we will use in the rest of the workshop.
# Find out the frequency of mutations per patient
mutation_per_patient <- mutation_data %>%
filter(Consequence != "synonymous_variant") %>%
pull(Tumor_Sample_Barcode) %>%
table() %>%
data.frame() %>%
select(patient_ID = ".", Mutation_count=Freq)
# subsetting and formatting the expression data
sub_expression_data <- expression_data %>%
filter(Hugo_Symbol %in% c("GATA3","FOXA1","MLPH","ESR1","ERBB2","PGR","TP53","PIK3CA",
"AKT1", "PTEN", "PIK3R1", "FOXO3","RB1", "KMT2C", "ARID1A",
"NCOR1","CTCF","MAP3K1","NF1","CDH1","TBX3","CBFB","RUNX1",
"USP9X","SF3B1"))
rm(expression_data)
rownames(sub_expression_data) <- sub_expression_data$Hugo_Symbol
sub_expression_data <- sub_expression_data %>%
select(-Hugo_Symbol,-Entrez_Gene_Id) %>%
t() %>%
data.frame() %>%
mutate(patient_ID = rownames(.))
# subsetting the sample_data
sub_sample_data <- sample_data %>%
select(patient_ID = PATIENT_ID,
sample_ID = SAMPLE_ID,
cancer_type = CANCER_TYPE,
cancer_type_detailed = CANCER_TYPE_DETAILED,
ER_status = ER_STATUS,
HER2_status = HER2_STATUS,
PR_status = PR_STATUS,
Neoplasm_Histologic_Grade = GRADE)
rm(sample_data)
# subsetting the patient data
sub_patient_data <- patient_data %>%
select(patient_ID = PATIENT_ID,
Three_gene_classifier_subtype = THREEGENE,
Age_at_diagnosis = AGE_AT_DIAGNOSIS,
Cellularity = CELLULARITY,
Chemotherapy = CHEMOTHERAPY,
ER_status_measured_by_IHC = ER_IHC,
Hormone_therapy = HORMONE_THERAPY,
Integrative_cluster = INTCLUST,
Nottingham_prognostic_index = NPI,
PAM50 = CLAUDIN_SUBTYPE)
# let's combine the dataset
combined_data <- left_join(sub_patient_data,sub_sample_data, by="patient_ID")
combined_data <- left_join(combined_data, mutation_per_patient, by="patient_ID")
combined_data$patient_ID <- gsub("-",".",combined_data$patient_ID) # replace '-' with '.' so the IDs match the expression-data column names (read.delim() turns MB-0000 into MB.0000)
combined_data <- left_join(combined_data,sub_expression_data, by="patient_ID")
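Two steps above deserve a small illustration:

- The table() %>% data.frame() route produces a column literally named ".", which is why it has to be renamed. dplyr's count() does the same job in one verb.
- left_join() keeps every row of the left table and fills in NA where the right table has no match. This is most likely where the 529 missing values reported by ggplot2 below come from (patients without an expression profile).

A toy sketch with made-up IDs and values:

```r
library(dplyr)

# toy stand-ins for mutation_data and sub_patient_data (made-up values)
toy_mut <- data.frame(Tumor_Sample_Barcode = c("P1", "P1", "P2", "P1"),
                      Consequence = c("missense_variant", "synonymous_variant",
                                      "frameshift_variant", "missense_variant"))
toy_patients <- data.frame(patient_ID = c("P1", "P2", "P3"),
                           Age_at_diagnosis = c(50.2, 61.7, 44.9))

# count() replaces table() %>% data.frame() %>% select(...) in one step
toy_counts <- toy_mut %>%
  filter(Consequence != "synonymous_variant") %>%
  count(patient_ID = Tumor_Sample_Barcode, name = "Mutation_count")

# left_join() keeps all patients; P3 has no mutations, so it gets NA
toy_joined <- left_join(toy_patients, toy_counts, by = "patient_ID")
toy_joined
```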
Now, we will generate a scatter plot of the expression of the estrogen receptor gene ESR1 against that of the transcription factor GATA3. Then, we will explore their co-expression by fitting a linear model (on the plot, of course), and refine it by ER_status (positive or negative) -
ggplot(data = combined_data) +
geom_point(mapping = aes(x = GATA3, y = ESR1))
## Warning: Removed 529 rows containing missing values (`geom_point()`).
ggplot(data = combined_data %>% na.omit(), aes(x = GATA3, y = ESR1)) +
geom_point() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
ggplot(data = combined_data %>% na.omit()) +
geom_point(mapping = aes(x = GATA3, y = ESR1, colour = ER_status))
ggplot(data = combined_data %>% na.omit(), aes(x = GATA3, y = ESR1, colour = ER_status)) +
geom_point() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
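geom_smooth(method = "lm") fits an ordinary least-squares line of y on x. When colour is mapped to a discrete variable such as ER_status, it fits one line per group. To see the numbers behind those lines, you can compute the same fits per group. The sketch uses mtcars so that it runs on its own. For the study data, you would presumably group combined_data by ER_status and use GATA3 and ESR1 in place of mpg and wt:

```r
library(dplyr)

# per-group least-squares line of wt on mpg:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
mtcars_fits <- mtcars %>%
  group_by(am) %>%
  summarise(slope     = cov(mpg, wt) / var(mpg),
            intercept = mean(wt) - slope * mean(mpg),
            r         = cor(mpg, wt),      # Pearson correlation within the group
            .groups   = "drop")

mtcars_fits
```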
On a different note, GATA3 expression is usually high in the Luminal A subtype of breast cancer and in tumours with positive estrogen receptor (ER+) status (Voduc D et al.). Let’s find out if that’s true for this study -
# GATA3 expression in PAM50 classified tumour types-
ggplot(combined_data, aes(PAM50, GATA3)) +
geom_boxplot()
## Warning: Removed 529 rows containing non-finite values (`stat_boxplot()`).
# GATA3 expression in tumour with different ER status (positive and negative)-
ggplot(combined_data %>% na.omit(), aes(ER_status, GATA3)) +
geom_boxplot()
ggplot(combined_data %>% na.omit(), aes(ER_status, GATA3)) +
geom_violin(aes(fill=ER_status))
Now, we will look at the distribution of age of the patients at diagnosis as a function of some selected mutated genes.
mut_gene <- mutation_data %>%
filter(Consequence != "synonymous_variant") %>%
select(gene=Hugo_Symbol,patient_ID=Tumor_Sample_Barcode )
patient_age <- patient_data %>% select(age=AGE_AT_DIAGNOSIS,patient_ID=PATIENT_ID)
plot_data <- left_join(mut_gene,patient_age,by="patient_ID") %>%
filter(gene %in% c("PIK3CA", "TP53", "GATA3", "CDH1", "MAP3K1", "CBFB", "SF3B1")) %>%
mutate(age_cat = case_when(age < 45 ~ "<45",
age < 55 ~ "45-54", # contiguous bins, so decimal ages such as 54.5 are not dropped
age < 65 ~ "55-64",
age >= 65 ~ ">64")) %>%
na.omit()
plot_data$age_cat <- factor(plot_data$age_cat, ordered = T, levels = c(">64","55-64","45-54","<45"))
plot_data %>%
group_by(gene,age_cat) %>%
select(gene,age_cat) %>%
summarise(freq=n()) %>%
ggplot() +
geom_col(aes(gene,freq, fill=age_cat), position="fill", colour="black") +
scale_fill_manual(values=c("#568a48","#6fad76","#aac987","#e6ede3")) +
theme_classic()
## `summarise()` has grouped output by 'gene'. You can override using the
## `.groups` argument.
Can we distinguish any pattern from the plot?
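Age at diagnosis can carry decimals (e.g. 54.5), so age bins need boundaries without gaps between them. Base R's cut() builds contiguous intervals. With right = FALSE, each interval includes its lower bound, so 45 <= age < 55 becomes "45-54". Below is a sketch on made-up ages, using the same labels as the plot above:

```r
# made-up ages, including values between the integer boundaries
ages <- c(38.2, 45.0, 54.5, 55.0, 64.5, 65.0, 81.3)

# intervals [-Inf, 45), [45, 55), [55, 65), [65, Inf)
age_cat <- cut(ages,
               breaks = c(-Inf, 45, 55, 65, Inf),
               labels = c("<45", "45-54", "55-64", ">64"),
               right  = FALSE,           # include the lower bound of each interval
               ordered_result = TRUE)    # ordered factor, like the one used in the plot

data.frame(ages, age_cat)
```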
Now, we will try to explore patterns of co-occurring mutations and mutual exclusivity in a set of 21 driver genes (so-called Mut-driver genes) -
#install.packages("splitstackshape")
#install.packages("reshape2")
library(splitstackshape)
library(reshape2)
# create a matrix for the combination of the frequency of mutated genes and each patient
mat <- t(splitstackshape:::charMat(listOfValues = split( mut_gene$gene,mut_gene$patient_ID), fill = 0L))
# set of 21 Mut-driver genes
mat_gene <- c("PIK3CA","AKT1","PTEN","PIK3R1","FOXO3", "RB1", "KMT2C", "ARID1A","NCOR1","CTCF", "TP53", "MAP3K1", "NF1","CDH1","GATA3","TBX3","CBFB","RUNX1","ERBB2","USP9X","SF3B1")
# create an empty matrix
mat_asso <- matrix(data=NA, nrow = length(mat_gene), ncol = length(mat_gene))
colnames(mat_asso) <- mat_gene
rownames(mat_asso) <- mat_gene
# fill in the cells with log odds ratio for each pairwise association test
for(i in mat_gene){
for(j in mat_gene){
mat_asso[i,j] <- fisher.test(mat[i,],mat[j,])$estimate %>% log()
}
}
# set the upper triangle and the diagonal (a gene against itself gives log odds Inf) to 0
mat_asso[upper.tri(mat_asso, diag = T)] <- 0
ggplot(melt(mat_asso), aes(Var1,Var2)) +
geom_tile(aes(fill=value), colour="white") +
scale_fill_gradient2(low = "#7c4d91", high = "#5e8761",mid = "white", limits = c(-2,2)) +
labs(title = "Patterns of association between somatic events",
caption = "Purple squares represent negative associations (mutually exclusive mutations).\nGreen squares represent positively associated events (co-mutation).\nThe colour scale represents the magnitude of the association (log odds)",
x="",
y="",
fill= "Log odds")+
theme_classic() +
coord_flip() +
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank(),
axis.line.x = element_blank(),
axis.line.y = element_blank())
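To make the log odds in this heat map more concrete, here is a toy 2 × 2 example with made-up mutation indicators for two genes. A few points are worth keeping in mind:

- fisher.test() reports the conditional maximum-likelihood estimate of the odds ratio. This is close to, but not the same as, the sample odds ratio ad/bc.
- When given two vectors, as in the loop above, fisher.test() builds the same 2 × 2 table internally.
- If a gene were mutated in no patient (or in all of them), the table would have a single column and fisher.test() would stop with an error. The loop therefore assumes every driver gene varies across patients.

```r
# made-up mutation indicators (1 = mutated) for two genes across 12 patients
gene_a <- c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
gene_b <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1)

# 2 x 2 contingency table: rows = gene_a status, columns = gene_b status
tab <- table(gene_a, gene_b)
tab

res <- fisher.test(tab)
res$estimate          # conditional MLE of the odds ratio
log(res$estimate)     # > 0 suggests co-occurrence, < 0 mutual exclusivity
res$p.value           # evidence against independence of the two events
```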